Recommender System Using a Neural Network Model

  • Omar El-Dessouky
  • Seif Rady

Problem Statement

Recommendation systems build models that predict user preferences and recommend items, such as movies, based on those preferences to create a more personalized experience.
Explicit data: direct, quantitative feedback collected from users, such as movie ratings. Its drawback is that it is scarce.
Implicit data: feedback collected indirectly from user interactions, e.g. clicking on a movie. It is abundant and lets us tailor recommendations in real time with every click and interaction. However, every interaction is assumed to be positive, so negative preferences cannot be captured.

Dataset: MovieLens 20M Dataset

The MovieLens 20M Dataset is a large dataset containing 20,000,263 ratings of 27,278 movies by 138,493 users.


Source: https://i.imgur.com/oNJnLqU.png


Input/Output Examples

The model takes a user's movie ratings as input and recommends movies to that user based on those ratings.



State of the art


  • Neural Collaborative Filtering Implicit Model
    • Achieves 86% on the hit@10 metric
    • Uses Matrix Factorization
    • Uses Implicit Data

  • Neural Collaborative Filtering Explicit Model
    • Achieves an RMSE of 0.88
    • Uses Matrix Factorization
    • Uses Explicit Data
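
The two metrics quoted above can be made concrete with a short sketch. It assumes the common leave-one-out protocol for hit@10 (one held-out movie per user, checked against that user's ranked recommendations), which this report does not spell out. The function names are hypothetical.

```python
import math

def hit_at_k(ranked_items, held_out_item, k=10):
    """Return 1 if the held-out item appears in the top-k of the ranking, else 0."""
    return int(held_out_item in ranked_items[:k])

def mean_hit_at_k(rankings, held_out, k=10):
    """Average hit@k over users; both arguments are dicts keyed by user id."""
    hits = [hit_at_k(rankings[user], held_out[user], k) for user in held_out]
    return sum(hits) / len(hits)

def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A higher hit@10 is better (more held-out movies land in the top ten), while a lower RMSE is better (predicted ratings are closer to the true ones), so the two figures are not directly comparable.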

Original Models from the Literature

NCF Implicit Model

This model takes one-hot encoded user and movie vectors and feeds them into the User Embedding Layer and the Movie Embedding Layer, respectively. The two embeddings are then concatenated and mapped to the prediction vector.
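
A minimal PyTorch sketch of this architecture follows. The class name, the embedding size, the single linear output layer, and the sigmoid activation are all assumptions made for illustration; the report does not specify them.

```python
import torch
import torch.nn as nn

class ImplicitNCF(nn.Module):
    """Hypothetical sketch: user and movie embeddings, concatenated, mapped to a score."""

    def __init__(self, num_users, num_movies, embed_dim=8):
        super().__init__()
        # Looking up an embedding by integer id is equivalent to multiplying
        # a one-hot vector by the embedding weight matrix.
        self.user_embedding = nn.Embedding(num_users, embed_dim)
        self.movie_embedding = nn.Embedding(num_movies, embed_dim)
        self.output = nn.Linear(2 * embed_dim, 1)

    def forward(self, user_ids, movie_ids):
        user_vec = self.user_embedding(user_ids)
        movie_vec = self.movie_embedding(movie_ids)
        combined = torch.cat([user_vec, movie_vec], dim=-1)
        # Assumed: a sigmoid turns the raw score into an interaction probability.
        return torch.sigmoid(self.output(combined)).squeeze(-1)
```

Calling `ImplicitNCF(num_users, num_movies)(user_ids, movie_ids)` with two equal-length id tensors returns one score per (user, movie) pair.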



NCF Explicit Model

The inputs to the model are the one-hot encoded user and item vectors, which are fed to the user embedding and the item embedding, respectively. The model then computes the similarity between the two embeddings using a Dense layer with a ReLU activation, followed by a Dropout layer and another Dense layer with a ReLU activation. The final output of the model is a list of movies, each with its predicted rating.
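
The layer stack described above could look roughly like the following PyTorch sketch. The class name, layer sizes, and dropout rate are hypothetical, and concatenating the two embeddings before the first Dense layer is an assumption, since the report does not say how they are combined.

```python
import torch
import torch.nn as nn

class ExplicitNCF(nn.Module):
    """Hypothetical sketch: embeddings -> Dense+ReLU -> Dropout -> Dense+ReLU."""

    def __init__(self, num_users, num_items, embed_dim=8, hidden_dim=16, dropout=0.2):
        super().__init__()
        self.user_embedding = nn.Embedding(num_users, embed_dim)
        self.item_embedding = nn.Embedding(num_items, embed_dim)
        self.layers = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
            nn.ReLU(),  # keeps the predicted rating non-negative
        )

    def forward(self, user_ids, item_ids):
        # Assumed: the two embeddings are concatenated before the dense layers.
        combined = torch.cat(
            [self.user_embedding(user_ids), self.item_embedding(item_ids)], dim=-1
        )
        return self.layers(combined).squeeze(-1)
```

Scoring every candidate movie for one user and sorting by the predicted rating yields the ranked list mentioned in the text.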



Proposed Updates

Update #1: A new model mixing implicit and explicit data

Current models use either implicit data or explicit data. Our model combines an implicit feedback model with an explicit feedback model.
For a given user, the new model ranks the recommended movies using both the implicit model (Neural Collaborative Filtering Model) and the explicit feedback model (Matrix Factorization Model).
A weight is then assigned to each model's output, and the weighted outputs are combined into a new recommended list of movies.
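
One way to read the weighting step is as a weighted sum of per-movie scores, sketched below. The function names and the 0.6/0.4 default weights are hypothetical, and the sketch assumes both models' scores have already been scaled to the same range; the report does not state the actual weights or scaling.

```python
def blend_scores(implicit_scores, explicit_scores, w_implicit=0.6, w_explicit=0.4):
    """Combine per-movie scores from two models with fixed weights.

    Both inputs are dicts mapping movie id -> score on a shared scale.
    A movie missing from one model contributes 0 for that model (an assumption).
    """
    movies = set(implicit_scores) | set(explicit_scores)
    return {
        m: w_implicit * implicit_scores.get(m, 0.0)
        + w_explicit * explicit_scores.get(m, 0.0)
        for m in movies
    }

def recommend(implicit_scores, explicit_scores, k=10, **weights):
    """Return the k movie ids with the highest blended score."""
    blended = blend_scores(implicit_scores, explicit_scores, **weights)
    # Highest score first; ties are broken by movie id so the order is deterministic.
    return sorted(blended, key=lambda m: (-blended[m], m))[:k]
```

Shifting weight toward the explicit model favors movies the user is predicted to rate highly, while shifting it toward the implicit model favors movies the user is likely to interact with.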



Update #2: Hyper-parameter Tuning

We tuned several hyper-parameters, including the rule used to convert explicit ratings into implicit values.
Previously, any positive rating was mapped to 1 and any unrated movie to 0. We tried alternatives to this rule. For example, we mapped ratings higher than 3 to 1, and ratings lower than 3 or missing ratings to 0.
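
The thresholding rule can be sketched as follows. The function names and the dictionary layout are hypothetical. A rating of exactly 3 is mapped to 0 here, which is an assumption because the text does not say how that case was handled.

```python
def to_implicit(ratings, threshold=3.0):
    """Map explicit ratings to binary labels.

    ratings: dict mapping (user_id, movie_id) -> rating.
    Ratings strictly above the threshold become 1, all others 0.
    """
    return {pair: int(rating > threshold) for pair, rating in ratings.items()}

def label_for(implicit_labels, user_id, movie_id):
    """Unrated (user, movie) pairs are absent from the dict and default to 0."""
    return implicit_labels.get((user_id, movie_id), 0)
```

With `threshold=0.0` the same function reproduces the earlier rule, in which any positive rating counts as 1.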

Results

Using 2,000,000 ratings

  • The NCF Implicit Model scored 81% on the hit@10 metric.
  • The NCF Explicit Model scored 80% on the hit@10 metric.
  • The Hybrid Model scored 90% on the hit@10 metric by combining the first 6 movies recommended by the NCF Implicit Model with the first 4 movies recommended by the NCF Explicit Model.
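
The 6 + 4 combination used by the Hybrid Model can be sketched as a simple list merge. The function name is hypothetical, and skipping movies that both models recommend (so the final list still has ten distinct entries) is an assumption; the report does not say how overlaps were handled.

```python
def merge_top_lists(implicit_ranked, explicit_ranked, n_implicit=6, n_explicit=4):
    """Take the first n_implicit movies from the implicit ranking, then append
    the first n_explicit movies from the explicit ranking not already chosen."""
    merged = list(implicit_ranked[:n_implicit])
    chosen = set(merged)
    for movie in explicit_ranked:
        if len(merged) >= n_implicit + n_explicit:
            break
        if movie not in chosen:  # assumed: duplicates are skipped, not repeated
            merged.append(movie)
            chosen.add(movie)
    return merged
```

The merged list is then scored with hit@10 in the same way as each individual model's list.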

Technical report

Training details:

  • Programming framework: Python, PyTorch, Keras
  • Training hardware: Google Colab and AWS EC2
  • Training time: Around 4 hours
  • Number of epochs: 5 epochs for the NCF Model and 50 epochs for the Matrix Factorization Model
  • Time per epoch: Around 30 minutes

Conclusion

Merging in explicit data increases the probability that a user gives a high rating to the recommended movies. This makes the model better suited to commercial recommender systems, where the goal is for the user to buy an item, not only interact with it. The merged model balances a high hit ratio with better likability, as reflected in higher ratings.